
Improve chanmon_consistency fuzz target performance#4509

Open
joostjager wants to merge 3 commits into lightningdevkit:main from joostjager:fuzz-performance


@joostjager
Contributor

Reduce per-iteration overhead in the chanmon_consistency fuzz target. Together, these changes achieve a 3-4x speedup. This target has proven crucial for finding bugs, so maximizing its iteration rate matters.

The searched-for log message ("Outbound update_fee HTLC buffer
overflow") no longer exists in the lightning crate, so the
from_utf8 + contains check on every log line was pure waste.

AI tools were used in preparing this commit.
Even though DevNull discards the bytes, the formatting work
(SubstringFormatter, fmt::write, from_utf8) was still being done
on every log call. Short-circuit in TestLogger::log via a TypeId
check, which monomorphization resolves at compile time.

AI tools were used in preparing this commit.
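
The short-circuit described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the diff: the `Output` trait, `DevNull`, `Collect`, and this `TestLogger` are simplified stand-ins, and the `formatted` counter exists only to make the skipped work observable.

```rust
use std::any::TypeId;
use std::cell::Cell;
use std::sync::Mutex;

/// Simplified stand-in for the harness's output sink trait (assumption).
/// The `'static` bound is what makes `TypeId::of::<Out>()` usable.
pub trait Output: 'static {
    fn locked_write(&self, data: &[u8]);
}

/// Sink that discards every byte.
pub struct DevNull;
impl Output for DevNull {
    fn locked_write(&self, _data: &[u8]) {}
}

/// Sink that keeps the bytes, used here only for comparison.
#[derive(Default)]
pub struct Collect(Mutex<Vec<u8>>);
impl Collect {
    pub fn contents(&self) -> Vec<u8> {
        self.0.lock().unwrap().clone()
    }
}
impl Output for Collect {
    fn locked_write(&self, data: &[u8]) {
        self.0.lock().unwrap().extend_from_slice(data);
    }
}

/// Illustrative logger, generic over its sink.
pub struct TestLogger<Out: Output> {
    id: String,
    out: Out,
    formatted: Cell<usize>, // counts how often formatting actually ran
}

impl<Out: Output> TestLogger<Out> {
    pub fn new(id: &str, out: Out) -> Self {
        TestLogger { id: id.to_string(), out, formatted: Cell::new(0) }
    }

    pub fn log(&self, msg: &str) {
        // Each monomorphized copy of `log` compares two type-dependent
        // values, so the optimizer can typically fold this branch away.
        if TypeId::of::<Out>() == TypeId::of::<DevNull>() {
            return; // skip formatting entirely when output is discarded
        }
        self.formatted.set(self.formatted.get() + 1);
        let line = format!("{} {}\n", self.id, msg);
        self.out.locked_write(line.as_bytes());
    }

    pub fn formatted_count(&self) -> usize {
        self.formatted.get()
    }

    pub fn sink(&self) -> &Out {
        &self.out
    }
}
```

With a `DevNull` sink, `log` returns before any string is built. With any other sink, it formats and writes as before.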
@ldk-reviews-bot

ldk-reviews-bot commented Mar 24, 2026

👋 I see @TheBlueMatt was un-assigned.
If you'd like another reviewer assignment, please click here.

@codecov

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.18%. Comparing base (ab31f99) to head (04f3894).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4509      +/-   ##
==========================================
- Coverage   86.19%   86.18%   -0.02%     
==========================================
  Files         160      160              
  Lines      107537   107537              
  Branches   107537   107537              
==========================================
- Hits        92693    92679      -14     
- Misses      12220    12230      +10     
- Partials     2624     2628       +4     
Flag Coverage Δ
tests 86.18% <ø> (-0.02%) ⬇️


@joostjager
Contributor Author

I can now see that the speedup on CI is somewhat smaller than on my local machine: roughly 2.5x.

@joostjager joostjager marked this pull request as ready for review March 25, 2026 13:31
@joostjager joostjager requested a review from TheBlueMatt March 25, 2026 13:31
Comment on lines 303 to 310
Ok(chain::ChannelMonitorUpdateStatus::InProgress) => {
let persisted_monitor = mon.clone();
LatestMonitorState {
persisted_monitor_id: monitor_id,
persisted_monitor,
pending_monitors: vec![(monitor_id, mon)],
}
},
Collaborator

Behavior change: In the old code, persisted_monitor was Vec::new() (empty) for the InProgress case, meaning a reload via use_old_mons % 3 == 0 would panic on deserialization of the empty vec. Now persisted_monitor is a valid clone of the same monitor that goes into pending_monitors.

This means all three reload paths (%3 == 0/1/2) produce identical behavior for a freshly-watched InProgress channel (the same monitor object), whereas before %3 == 0 was a distinct (crash) path. The crash was arguably a test harness bug (not a useful coverage path), so this seems reasonable, but the field comment ("The latest ChannelMonitor that we told LDK we persisted", line 247) is now inaccurate for the InProgress case — we haven't actually told LDK we persisted it.

Consider either updating the comment to reflect this, or wrapping the field in Option<ChannelMonitor> so the InProgress watch case can be represented as None.
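
The `Option` variant suggested above might look roughly like this. It is a hedged sketch: `Monitor` and `UpdateStatus` are hypothetical stand-ins for `ChannelMonitor` and `chain::ChannelMonitorUpdateStatus`, the `Completed` arm is an assumption about how the other case would be handled, and `state_after_watch` is an illustrative helper rather than a function from the diff.

```rust
/// Hypothetical stand-in for `ChannelMonitor`.
#[derive(Clone, Debug, PartialEq)]
pub struct Monitor {
    pub update_id: u64,
}

/// Hypothetical stand-in for the persistence status returned to the harness.
pub enum UpdateStatus {
    Completed,
    InProgress,
}

pub struct LatestMonitorState {
    pub persisted_monitor_id: u64,
    /// The latest monitor that we told LDK we persisted, or `None` if no
    /// persistence has been reported as complete yet.
    pub persisted_monitor: Option<Monitor>,
    /// Monitors whose persistence is still in flight.
    pub pending_monitors: Vec<(u64, Monitor)>,
}

/// Illustrative helper: build the state recorded after watching a channel.
pub fn state_after_watch(
    status: UpdateStatus,
    monitor_id: u64,
    mon: Monitor,
) -> LatestMonitorState {
    match status {
        UpdateStatus::Completed => LatestMonitorState {
            persisted_monitor_id: monitor_id,
            persisted_monitor: Some(mon),
            pending_monitors: Vec::new(),
        },
        UpdateStatus::InProgress => LatestMonitorState {
            persisted_monitor_id: monitor_id,
            // Nothing has been reported as persisted, so say so explicitly.
            persisted_monitor: None,
            pending_monitors: vec![(monitor_id, mon)],
        },
    }
}
```

A reload path that wants the persisted monitor then has to handle `None` explicitly instead of deserializing an empty buffer.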

Contributor Author

Made it an Option.

Comment on lines 325 to +326
let res = self.chain_monitor.update_channel(channel_id, update);
let mon = self.persister.take_latest_monitor(&channel_id);
Collaborator

Coverage reduction: The old update_channel independently deserialized the latest stored monitor, applied the update via update_monitor(), re-serialized, and stored the result. This verified on every update that:

  1. The stored monitor data round-trips correctly through serialization
  2. Updates can be successfully applied to a round-tripped monitor

The new code delegates entirely to chain_monitor.update_channel and captures the result from the persister, deferring round-trip verification to reload boundaries only (lines 974–981).

If a particular update introduces a serialization issue that a subsequent update happens to mask before the next reload, the new code won't catch it. This is likely an acceptable trade-off for the 3-4x speedup (more iterations compensate for reduced per-iteration verification), but worth noting as a deliberate coverage change.
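
To make the trade-off concrete, here is a toy comparison of the two update styles. Everything in it is an assumption for illustration: `Monitor`, its byte layout, and both helper functions are hypothetical and far simpler than a real `ChannelMonitor`.

```rust
/// Toy monitor with a trivial wire format: 8-byte big-endian id, then payload.
#[derive(Clone, Debug, PartialEq)]
pub struct Monitor {
    pub update_id: u64,
    pub payload: Vec<u8>,
}

impl Monitor {
    pub fn serialize(&self) -> Vec<u8> {
        let mut out = self.update_id.to_be_bytes().to_vec();
        out.extend_from_slice(&self.payload);
        out
    }

    pub fn deserialize(bytes: &[u8]) -> Option<Monitor> {
        if bytes.len() < 8 {
            return None; // too short to hold the id
        }
        let (head, tail) = bytes.split_at(8);
        let mut id = [0u8; 8];
        id.copy_from_slice(head);
        Some(Monitor { update_id: u64::from_be_bytes(id), payload: tail.to_vec() })
    }

    pub fn apply_update(&mut self, byte: u8) {
        self.update_id += 1;
        self.payload.push(byte);
    }
}

/// Old style: every update pays for deserialize + apply + serialize, which
/// also exercises the round trip each time.
pub fn update_with_round_trip(stored: &[u8], byte: u8) -> Option<Vec<u8>> {
    let mut mon = Monitor::deserialize(stored)?;
    mon.apply_update(byte);
    Some(mon.serialize())
}

/// New style: update the in-memory monitor only; serialization happens
/// later, when a reload actually needs the bytes.
pub fn update_in_memory(mon: &mut Monitor, byte: u8) {
    mon.apply_update(byte);
}
```

Both paths end in the same state when serialization is correct. The difference is only in how often a serialization bug would get the chance to surface.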

Contributor Author

Indeed, my thinking was that plain serialization is covered well enough elsewhere.

@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented Mar 25, 2026

I've thoroughly reviewed the entire diff, including checking the underlying ChainMonitor implementation for correctness of the new take_latest_monitor pattern, verifying lock ordering safety, checking ChannelMonitor's clone semantics, and confirming the SearchingOutput removal was dead code.

No new issues found beyond what was flagged in my prior review.

Review Summary

After a thorough second pass examining every hunk in the diff and the underlying implementations, I found no new issues beyond those already flagged in my prior review:

  1. fuzz/src/chanmon_consistency.rs:303-310 (prior) — InProgress case in watch_channel now stores a valid monitor where old code stored empty bytes, changing reload behavior for use_old_mons % 3 == 0.

  2. fuzz/src/chanmon_consistency.rs:322-323 (prior) — update_channel no longer independently verifies round-trip serialization on every update, deferring it to reload boundaries only.

Verified as correct:

  • TestPersister::take_latest_monitor will always find a monitor because ChainMonitor::watch_channel_internal and ChainMonitor::update_channel_internal always call the persister (confirmed deferred=false in all call sites).
  • No deadlock risk: TestChainMonitor::latest_monitors and TestPersister::latest_monitors locks are never held in reverse order simultaneously.
  • ChannelMonitor::clone() creates an independent deep copy (inner is Mutex<ChannelMonitorImpl>, clone locks and clones the impl), so storing clones instead of serialized bytes is semantically equivalent.
  • TypeId::of::<Out>() == TypeId::of::<DevNull>() is sound (Output: 'static) and monomorphizes to a compile-time constant.
  • SearchingOutput/may_fail removal is correct — the searched log message ("Outbound update_fee HTLC buffer overflow") no longer exists in the codebase, so may_fail was always false.
  • Minor: full_stack.rs accumulates monitors in TestPersister::latest_monitors that are never consumed, but memory is bounded by active channel count (HashMap key is ChannelId, insert overwrites).

Previously, TestChainMonitor::update_channel would deserialize the
monitor from stored bytes, apply the update, and serialize it back.
This duplicated the work already done by the inner ChainMonitor,
which applies the update to its in-memory monitor and calls the
persister.

Instead, have TestPersister capture the monitor directly when the
real ChainMonitor calls persist. Serialization is deferred until
reload_node actually needs the bytes, which happens rarely (only on
specific fuzz input bytes that trigger a node restart). This
eliminates redundant deserialization and serialization on every
monitor update, replacing the expensive serialize-on-every-persist
with a cheaper clone.

AI tools were used in preparing this commit.
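
The capture-then-take pattern described in this commit message can be sketched as below. This is a simplified illustration, not the harness code: `Monitor`, the `ChannelId` alias, and the `persist` signature are assumptions, and the real `take_latest_monitor` may differ (for example, it may panic rather than return an `Option`).

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical channel identifier.
pub type ChannelId = [u8; 32];

/// Hypothetical stand-in for `ChannelMonitor`.
#[derive(Clone, Debug, PartialEq)]
pub struct Monitor {
    pub update_id: u64,
}

impl Monitor {
    /// Only called when a reload actually needs bytes.
    pub fn serialize(&self) -> Vec<u8> {
        self.update_id.to_be_bytes().to_vec()
    }
}

#[derive(Default)]
pub struct TestPersister {
    /// Latest captured monitor per channel; insert overwrites, so the map
    /// holds at most one entry per channel.
    latest_monitors: Mutex<HashMap<ChannelId, Monitor>>,
}

impl TestPersister {
    /// Called on every persist: keep a clone instead of serializing.
    pub fn persist(&self, channel_id: ChannelId, mon: &Monitor) {
        self.latest_monitors.lock().unwrap().insert(channel_id, mon.clone());
    }

    /// Hand the most recently captured monitor to the caller, removing it.
    pub fn take_latest_monitor(&self, channel_id: &ChannelId) -> Option<Monitor> {
        self.latest_monitors.lock().unwrap().remove(channel_id)
    }
}
```

A caller would persist on each update, take the captured monitor right afterwards, and call `serialize` only on the reload path.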
@joostjager joostjager removed the request for review from TheBlueMatt March 25, 2026 13:59

3 participants